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We introduce a new procedure to construct weight factors, which flatten the probability density of 
the overlap with respect to some pre-defined reference configuration. This allows one to overcome 
free energy barriers in the overlap variable. Subsequently, we generalize the approach to deal with the 
overlaps with respect to two reference configurations so that transitions between them are induced. 
We illustrate our approach by simulations of the brainpeptide Met-enkephalin with the ECEPP/2 
energy function using the global-energy-minimum and the second lowest-energy states as reference 
configurations. The free energy is obtained as functions of the dihedral and the root-mean-square 
distances from these two configurations. The latter allows one to identify the transition state and 
to estimate its associated free energy barrier. 
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I. INTRODUCTION 

Markov chain Monte Carlo (MC) simulations, for in- 
stance by means of the Metropolis method [1], are well 
suited to simulate generalized ensembles. Generalized en- 
sembles do not occur in nature, but are of relevance for 
computer simulations (see [2-4] for recent reviews) . They 
may be designed to overcome free energy barriers, which 
are encountered in Metropolis simulations of the Gibbs- 
Boltzmann canonical ensemble. Generalized ensembles 
do still allow for rigorous estimates of the canonical ex- 
pectation values, because the ratios between their weight 
factors and the canonical Gibbs-Boltzmann weights are 
exactly known. 

Umbrella sampling [5] was one of the earliest 
generalized-ensemble algorithms. In the multicanonical 
approach [6,7] one weights with a microcanonical tem- 
perature, which corresponds, in a selected energy range, 
to a working estimate of the inverse density of states. Ex- 
pectation values of the canonical ensembles can be con- 
structed for a wide temperature range, hence the name 
"multicanonical" . Here, "working estimate" means that 
running the updating procedure with the (fixed) multi- 
canonical weight factors covers the desired energy range. 
The Markov process exhibits random walk behavior and 
moves in cycles from the maximum (or above) to the 
minimum (or below) of the chosen energy range, and 



* Present address: Theory II, Institute of Solid State Re- 
search, Forschungszentrum Julich, D-52425 Julich, Germany. 
E-mail : hi . noguchi® fz-j uelich . de . 



back. A working estimate of the multicanonical weights 
allows for calculations of the spectral density and all re- 
lated thermodynamical observables with any desired ac- 
curacy by simply increasing the MC statistics. Thus, we 
have a two-step approach: The first step is to obtain 
the working estimate of the weights, and the second step 
is to perform a long production run with these weights. 
There is no need for that estimate to converge towards 
the exact inverse spectral density. Once the working es- 
timate of the weights exists, MC simulations with frozen 
weights converge and allow one to calculate thermody- 
namical observables with, in principle, arbitrary preci- 
sion. Various methods, ranging from finite-size scaling 
estimates [8] in case of suitable systems to general pur- 
pose recursions [9-11], are at our disposal to obtain a 
working estimate of the weights. 

In the present article we deal with a variant of the mul- 
ticanonical approach: Instead of flattening the energy 
distribution, we construct weights to flatten the proba- 
bility density of the overlap with a given reference con- 
figuration. This allows one to overcome energy barriers 
in the overlap variable and to get accurate estimates of 
thermodynamic observables at overlap values which are 
rare in the canonical ensemble. A similar concept was 
previously used in spin glass simulations [12], but there 
is a crucial difference: In Ref. [12] the weighting was 
done for the self-overlap of two replicas of the system 
and a proper name would be multi-self-overlap simula- 
tions, while in the present article we are dealing with the 
overlap to a predefined configuration. 

We next generalize our approach to deal with two ref- 
erence configurations so that transitions between them 
become covered and our method allows one then to esti- 
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mate the transition states and its associated free energy 
barrier. We have in mind situations where experimen- 
talists determined the reference configurations and ob- 
served transitions between them, but an understanding 
of the free energy landscape between the configurations 
is missing. An example would be the conversion from a 
configuration with a helix structures to a native struc- 
ture which is mostly in the (3 sheet, as it is the case for 
/3-lactoglobulin [13,14]. 

The paper is organized as follows: In the next sec- 
tion we describe the algorithmic details, using first one 
and then two reference configurations. In particular, a 
two-step updating procedure is defined, which is typi- 
cally more efficient than the conventional one-step up- 
dating. Moreover, based on the sums of uniformly dis- 
tributed random numbers, a method to obtain a working 
estimate of the multi-overlap weights is introduced. In 
section III we illustrate the method for a simulation with 
the pentapeptide Met-enkcphalin. Our simulations use 
the all-atom energy function ECEPP/2 (Empirical Con- 
formational Energy Program for Peptides [15]) and rely 
on its implementation in the computer package SMMP 
(Simple Molecular Mechanics for Proteins [16]). We use 
as reference configurations the global energy minimum 
(GEM) state, which has been determined by many au- 
thors [17-21], and the second lowest-energy state, as 
identified in Refs. [19,22]. While our overlap definition 
relies on a distance definition in the space of the dihe- 
dral angles, it turns out that for the data analysis the 
use of the root-mean-square (rms) distance is crucial. It 
is only in the latter variable that one obtains a clear pic- 
ture of the transition saddle point in the two-dimensional 
free energy diagram. In the final section a summary of 
the present results and an outlook with respect to future 
applications are given. 

II. MULTI-OVERLAP METROPOLIS 
ALGORITHM 

In this section we explain the details of our multi- 
overlap algorithm. The overlap of a configuration versus 
a reference configuration is defined in the next subsec- 
tion. In the second subsection we discuss details of the 
updating. To achieve step one of the method, i.e., the 
construction of a working estimate of the multi-overlap 
weights, one could employ a similar recursion as the one 
used in [12] or explore the approach of [11]. Instead of 
doing so, we decided to test a new method: At infinite 
temperature, (3 — 0, the overlap distributions can be cal- 
culated analytically (see subsection II D). We use this 
as starting point and estimate the overlap weights at the 
desired temperature by increasing (3 in sufficiently small 
steps so that the entire overlap range remains covered. 
In the final subsection we define the overlap with respect 
to two distinct reference configurations to cover the tran- 
sition region between them. 



A. Definition of the overlap 

There is a considerable amount of freedom in defining 
the overlap of two configurations. For instance, one may 
rely on the rms distance between configurations, and in 
subsection HID we analyze some of our results in this 
variable. However, the computation of the rms distance 
is slow and for MC calculations it is important to rely on 
a computationally fast definition. Therefore, we define 
the overlap in the space of dihedral angles by, as it was 
already used in [24], 

q = (n- d)/n , (1) 

where n is the number of dihedral angles and d is the 
distance between configurations defined by 

1 ™ 

d = Wv-v'W = -Y^daiviM) ■ (2) 

Here, Vi is our generic notation for the dihedral angle i, 
—it < Vi < tt, and v 1 is the vector of dihedral angles of the 
reference configuration. The distance d a (vi,vl) between 
two angles is defined by 

d a (vi,vl) = min(\vi - v'i\,2w - \vi - v'i\) . (3) 

The symbol ||.|| defines a norm in a vector space. In 
particular, the triangle inequality holds 

Wv 1 - v 2 \\ < Wv 1 - v\\ + \\v - w 2 || . (4) 
For a single angle we have 

< \vi - vj | < tt < d < n . (5) 
At (3 = (i.e., infinite temperature) 

d t = -d a {vi,v}) (6) 

TT 

is a uniformly distributed random variable in the range 
< di < 1 and the distance d in (2) becomes the sum 
of n such uniformly distributed random variables, which 
allows for an exact calculation of its distribution. 



B. Multi-overlap weights 

We choose a reference configuration of n dihedral an- 
gles vj, (i — 1, . . . , n) to define the dihedral distance (2). 
We want to simulate the system with weight factors that 
lead to a random walk (RW) process in the dihedral dis- 
tance d, 

d < rfmin — > d > d max and back . (7) 

Here, <i m i n is chosen sufficiently small so that one can 
claim that the reference configuration has been reached, 
e.g., a few percent of n/2, which is the average d at 
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T = oo. The value of c? max has to be sufficiently large 
to introduce a considerable amount of disorder, e.g., 
d mal = n/2. In the following we call one event of the 
form (7) a random walk cycle (RWC). 

One possibility is to choose weight factors which give 
a flat probability density in the dihedral distance range 
< d < n/2, falling off for d > n/2 by keeping the in- 
dependence of the weight constant for d > n/2. This 
is quite similar to multimagnetical simulations [8], for 
which the external magnetic field takes the place of the 
reference configuration. The analogy becomes obvious, 
when the external field is defined via a ghost spin, which 
couples to all other spins. For instance, the spins s of the 
Heisenberg ferromagnct arc three-dimensional vectors of 
magnitude s 2 = 1. Their interaction with an external 
magnetic field H can be written as 

H-J2si = Hj2s H -Si = NHq , (8) 

i i 

where Sh is the unit vector in the direction of the mag- 
netic field, Si is the Heisenberg spin at site i, N is the 
number of spins, and q is the overlap of the spin config- 
uration with the reference configuration sh- 

i 

Using the multi-overlap language [12], the multi-magneti- 
cal [8] weight factors may then be re-written as 

exp (-0E + S(q)) = w c {E) w q (g) , (10) 

where 

w c {E) = exp(— , (11) 

and E = — J2{ij) ' 8j is energy function of the Heisen- 
berg ferromagnet (the sum is over nearest neighbor 
spins). Here, S(q) has the meaning of a microcanonical 
entropy of the overlap parameter, which has to be de- 
termined so that the probability density becomes flat in 
q. Weights for other than the flat distribution have also 
been discussed in the literature, e.g., Ref. [25], on which 
we shall comment in connection with figure 7 below. 

C. The updating procedure 

In essence, there are two ways to implement the up- 
date. 

1. Combine the multi-overlap and the canonical 
weights to one probability, which is accepted or re- 
jected in one random step. 

2. Accept or reject the multi-overlap and the canoni- 
cal probabilities sequentially in two random steps. 



1. One-step updating 

As defined in equations (10) and (11), the weight fac- 
tor is a product of w c (E) and w q (d), where w c (E) is the 
usual, canonical Gibbs-Boltzmann factor and w q (d) is the 
multi-overlap weight factor, where we now use the dis- 
tance d from the reference configuration (instead of the 
overlap q) as argument. As is clear from equation (1), 
the use of either q or d as argument is equivalent, while 
in the presentation of results the use of either variable 
can have intuitive advantages. In the one-step updating 
we combine the weights to 

w(E,d) = w c (E)w q (d) , (12) 

and accept or reject newly proposed configurations in 
the standard Metropolis way. Notably, the calculation 
of w q (d) (a simple table lookup) is very fast compared 
with the calculation of w c (E). Therefore, the following 
two-step procedure is of interest. 

2. Two-step updating 

Suppose that the present configuration is (d, E) and a 
new configuration (d',E') is proposed: 

(d,E) - (d',E') . (13) 

We can sequentially first accept or reject with the w q (d) 
probabilities and then conditionally, when the d-part is 
accepted, with the w c (E) probabilities. 

Proof: We show detailed balance for two subsequent 
updates of the same dihedral angle with the two-step 
procedure. There are four cases with probabilities of ac- 
ceptance: 

Pi, i = 1,2,3,4. (14) 
They are listed in the following: 



1. 


w q {d') > w q (d) and 


w c (E') > w c (E) : 










(15) 


2. 


w q (d') > w q {d) and 


w c (E') < w c (E) : 






P2 = w c (E')/w c (E), 




(16) 


3. 


w q (d') < w q (d) and 


w c (E') > w c (E) : 






P 3 = Wq(d')/W q (d), 




(17) 


4. 


w q (d!) < w q (d) and 


w c (E') < w c (E) : 






P 4 = w q (d')w c (E')/[w q (d)w c (E)}. 


(18) 



For the inverse move 

(d',E') - (d,E) (19) 
with probabilities of acceptance 

Pi i = 1,2,3,4, (20) 
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the cases are: 



1. w q (d) < w q {d') and w c {E) < w c {E') : 

P[ = w q (d) w c (E)/[w q (d') w c (E% (21) 

2. w q (d) < w q {d') and w c (E) > w c (E') : 

P^ = w q (d)/w q (d'), (22) 

3. w q (d) > w q {d') and w c (E) < w c {E') : 

= w c (E)/w c (E'), (23) 

4. w q (d) > w q (d') and w c (E) > w c (E') : 

(24) 



Pi = l. 



For the ratios we find 



P t w q (d')w c (E') 



P' 



w q {d) w c {E) 



(25) 



independently of i = 1,2,3,4. Therefore, we have con- 
structed a valid Metropolis updating procedure. 



D. Sums of a uniformly distributed random variable 

To calculate the overlap weights at infinite tempera- 
ture, we consider the sum 



x-i -\- . . . -\- x r 



(26) 



of the random variables x'j (j = 1, • • • , n), each uniformly 
distributed in the interval [0, 1) and derive a recursion 
formula for the probability density f n (u) of this distribu- 
tion. Care is taken to cast the recursion in a form which 
allows for a numerically stable implementation [26] over 
a reasonably large range of n. 

Let us recall the probability density of the uniform 
distribution: 



h{x) 



1, for < x < 1, 
0, otherwise. 



(27) 



To derive the recursion formula for the probability den- 
sity of the random variable (26), it is convenient to cast 
it in the form 



fn(u) = ^2f n ,k(xk) with x k =u-k + l, 
where 



(28) 



k=l 



fn,k{ X ) = < 



a n,k x \ f or < X < 1, 



i=0 

v 0, otherwise. 



(29) 



The master formula for the recursion is obtained from 
the convolution 



fn(u) 



h(u - v) f n -i{v) dv . 



(30) 



Let now u = x+fc — 1 with < x < 1, and equations (27), 
(28), and (29) imply 



pK — l-\-X 

fn,k{x) = I fn-l{v) dv 

Jk-2+x 

= / fn-i,k-i{y) dy + fn-i,k{y) dy . (31) 

Jx JO 

Using equation (29) and performing the integrations, we 
obtain 

ri-2 ^ n-2 xi+1 

fn,k(x) =^2a l n _ l k _ 1 -—-j - Y a n-l,k-l — 
i=0 i=0 



i=0 
n-2 



(32) 



i=0 



Expanding in powers of x and comparing (29) with (32) 
allows one to calculate the coefficients a % n k recursively in 
a numerically robust way: 



n-l „3 



n-l „j 



_ "n-l.fc-l i _ n - 1 ' k n-l.k-1 



3=0 



3=0 



.7 + 1 



(33) 



Once the coefficients a l n k are available, one can easily 
evaluate the probability densities f n (u) and the corre- 
sponding cumulative distribution functions. 

The probability density (28) takes its maximum value 
for u = n/2. Due to the central limit theorem the fall-off 
behavior is Gaussian as long as u stays sufficiently close 
to n/2. In the tails, for u — > or u —> n, the fall-off is 
much faster than Gaussian, namely an exponential of an 
exponential as follows from extreme value statistics [27]. 



E. Combination of two weights 

In the following the weights with superscript j, w q (dj), 
correspond to two distinct reference configurations w- J , 
(j = 1,2), and dj is the distance from the configura- 
tion at hand to the configuration u J . Let us assume 
that multi-overlap simulations with respect to the two 
reference configurations have been carried out and that 
the weights, w q (di) and w^efe), have been determined 
so that they sample their distance distributions approxi- 
mately uniformly. 

We want to construct combined weights w q 2 (di,d 2 ) 
which lead to a RW process between the configurations 
v 1 and v 2 . Our choice is 



w (di,d 2 ) = 



w q (di), for di < d 2 , 
Cj w q (d2), for di > d 2 . 



(34) 



The constant Cj, with j either 1 or 2, is introduced to 
allow for smooth transitions from d\ < d 2 to d[ > dl 2 
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FIG. 1. Reference configuration 1. Only backbone struc- 
ture is shown. The N-terminus is on the left-hand side and FIG. 2. Reference configuration 2. See the caption of figure 
the C-terminus on the right-hand side. The dotted lines stand 1 for details, 
for hydrogen bonds. The figure was created with RasMol [23] 



and vice versa. We determine Cj from the analysis 
of either run 1 (or run 2), which are the (one refer- 
ence configuration) simulations leading to the weights 
Wg(d\) (or Wg(d 2 )). The constant ci is found from run 
1 by scanning the time series for configuration for which 
d\ > d 2 holds and which have a one-update transition 
(di,d 2 ) — > (d^d^) with d[ < d' 2 . From these configura- 
tions k we determine the constant C\ so that 

k k 

holds. Similarly run 2 may be used to get c 2 . It turns out 
that the normalized weights almost agree in the transi- 
tion region and, therefore, the patching (34) works. The 
dependence of the constant on the run used for its de- 
termination is small, and it appears not worthwhile to 
explore more sophisticated methods. 

It is straightforward to implement the Metropolis up- 
dating with respect to the weights (34). For the transi- 
tion 

(di.da) - (d[,d' 2 ), (36) 
one has to distinguish four more cases: 

1. di < d 2 and d[ < d 2 , (37) 

2. di < d 2 and d[ > d 2 , (38) 

3. di > d 2 and d[ < d 2 , (39) 

4. di > d 2 and d[ > d 2 . (40) 



Alternatively to the approach outlined, one may com- 
bine d\ and d 2 into a new variable 6d for which the 
weights are then calculated as in the one-dimensional 
case. A suitable choice along this line is 

9 d = - arctan ( ^ ] . (41) 

7T V«2/ 



III. MET-ENKEPHALIN SIMULATIONS 

In the following we introduce two reference configura- 
tions. Subsequently, we discuss first the results for sim- 
ulations with one reference configuration and then those 
involving both reference configurations. 

A. The reference configurations 

Met-enkephalin has the amino-acid sequence Tyr-Gly- 
Gly-Phe-Met. We fix the peptide-bond dihedral angles u> 
to 180°, which implies that the total number of variable 
dihedral angles is n = 19. We neglect the solvent effects 
as in previous works. The low-energy configurations of 
Met-enkephalin in the gas phase have been classified into 
several groups of similar structures [19,22] . Two reference 
configurations, called configuration 1 and configuration 2, 
are used in the following and depicted in figures 1 and 2, 
respectively. Configuration 1 has a /3-turn structure with 
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TABLE I. Met-enkephalin reference configurations. The 
columns GEM m i n and B m i n correspond to configuration f and 
configuration 2, respectively. 



Residue 


Angle 


GEM [21] 


GEM min 


B [19] 


B m in 


1 


Xi 


-179.9 


-179.8 


-179 


+179.4 


1 


X2 


-111.3 


-111.4 


-95 


-94.3 


1 


X6 


+145.3 


+145.3 


+169 


-179.9 


1 




-86.4 


-86.3 


+111 


+ 55.7 


2 




+153.7 


+153.7 


+157 


+157.6 


2 




-161.6 


-161.5 


-71 


- 70.7 


3 


i> 


+ 71.2 


+ 71.1 


+ 78 


+ 78.0 


3 


<t> 


+ 64.1 


+ 64.1 


159 


+156.5 


4 


i> 


-93.5 


-93.5 


-37 


-35.7 


4 


Xi 


+179.8 


+179.8 


+ 59 


+ 55.3 


4 


X2 


+ 80.0 


+ 80.0 


+ 87 


+ 86.8 


4 




-81.7 


-81.7 


-154 


-155.7 


5 




- 29.2 


- 29.2 


+151 


+151.6 


5 


Xi 


-65.1 


-65.1 


-68 


-69.4 


5 


X2 


-179.2 


-179.2 


+177 


-176.3 


5 


X3 


-179.3 


-179.3 


-179 


-179.7 


5 


X4 


-60.0 


-59.9 


+ 60 


+ 59.9 


5 




-80.8 


-80.7 


-140 


-140.0 


5 


V't 


+143.9 


+143.5 


-29 


-30.6 




0123456789 
di 



FIG. 3. Weight estimates from simulations with reference 
configuration 1. From up to down the weight functions cor- 
respond to the following temperatures: 230 K, 300 K, 400 K, 
700 K, 2, 000 K, 10, 000 K, 100, 000 K and infinity (13 = 0). 



B. Simulations with one reference configuration 



hydrogen bonds between Gly-2 and Met-5, and configu- 
ration 2 a /3-turn with a hydrogen bond between Tyr-1 
and Phc-4 [22]. 

For our present work the two reference configurations 
were obtained by minimizing the GEM and the second 
lowest energy state of previous literature with respect 
to the ECEPP/2 energy function. The minimization was 
performed with the SMMP minimizcr [16] and by quench- 
ing. Both methods gave identical final energies. In table I 
we list the variable dihedral angles of the configurations 
before and after this minimization. The initial dihedral 
angles for the GEM are taken from table 1 of Ref. [21] 
and the initial dihedral angles for the second lowest en- 
ergy state B are from table 1 of Ref. [19]. In table I we 
give the angles in degrees, while for the MC simulations 
radians were used as in equations (1) and (2) for the 
overlap. Our labeling of the residues follows the SMMP 
convention and deviates from those of Refs. [21,19]. 

The distance between the two minimized configura- 
tions is d = 6.62 (q = 0.652) and their energies are given 
in table II. 



TABLE II. Energies (in kcal/mol) of the Met-enkephalin 
reference configurations 1 and 2. 





Total 


Coulomb 


Lennard- Jones 


H-Bond 


Torsion 


1 


-10.72 


+21.41 


-27.10 


-6.21 


+ 1.19 


2 


-8.42 


+22.59 


-26.38 


-4.85 


+0.23 



Each of our multi-overlap simulations at fixed temper- 
ature relies on a statistics of 16,777,216 sweeps for which 
data are recorded in a time series of 524,288 events, i.e., 
with a stepsize of 32 sweeps. We started most of our sim- 
ulations with the GEM configuration, but some random 
starts were also performed and no noticeable differences 
were encountered. 

Starting with the analytical result (28), valid at j3 = 0, 
the weights are calculated by increasing (3 (i.e., decreas- 
ing the temperature) between simulations slowly so that 
the RW of each simulation still covers the desired overlap 
range when using the weight estimates from the previous 
temperature. Discretization errors due to histograming 
can be severe and instead of weights which are piecewise 
constant within each one histogram interval, we used the 
interpolation of Ref. [6] : 

In w(d) = (1 — a) \nw(di) + a\nw(di + i) , for rf, < d < d i+ i , 

(42) 

where 



Figure 3 depicts the thus obtained weight function es- 
timates from simulations with reference configuration 1. 
After five simulations we arrive at the physical tempera- 
ture T — 300 K. The same iteration works with reference 
configuration 2. 

For the values d m i n = 0.025 n and rf max = 0.495 n, 
where n = 19 is the number of angels in (2), we list 
in table III the number of RWCs (7) achieved at each 
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temperature. We also list the CPU time ratios for the 
1-step versus the 2-step updating procedures, which we 
discussed in the previous section. Especially at high tem- 
peratures, which are needed in our approach, the 2-step 
updating turns out to be more efficient than the 1-step 
updating and all of our production runs were done with 
it. 



TABLE III. Number of random walk cycles in the simula- 
tions with our two reference configurations. The last column 
lists the CPU time ratios for 1-step versus 2-step updating. 



T 


Configuration 1 


Configuration 2 


l-step/2-step 


100, 000 K 


9,458 


9,514 


3.0 


10, 000 K 


3,122 


3,149 


1.8 


2,000K 


2,893 


2,741 


1.6 


700 K 


2,169 


2,227 


1.5 


400 K 


1,342 


1,693 


1.3 


300 K 


462 


610 


1.2 


230 K 


46 


41 


1.2 







T=230K 






T=300K 






T=400K 


- 


i\ M M 


T=700K 




\ 





-10 -5 5 10 15 20 25 30 
E (kcal/mol) 

FIG. 4. Canonical, peaked energy distributions obtained 
by re-weighting multi-overlap simulations. From left to right 
the temperatures used are: 230 K, 300 K, 400 K, and 700 K. 



We next rely on the peaked distribution function [26] 
to visualize some of the data kept in the time series of 
our simulations. The peaked distribution function of a 
probability density f(x) is defined by 

, . _ J F(x) for x < 0.5 , 
iWkedW - \ 1 - F(x) for x > 0.5 , [ ' 

where 

F(x) = f dx' f(x') (45) 

J — oo 

is the usual cumulative distribution function (see for in- 
stance [28]). 

To visualize how the canonical energy distribution 
moves when we lower the temperature, we plot in fig- 
ure 4 the peaked energy distributions as obtained by re- 
weighting some of the multi-overlap simulations of fig- 
ure 3 to the canonical ensemble of their simulation tem- 
perature. Due to the re-weighting the distributions look 
precisely as one expects for energies from canonical MC 
simulations. In contrast to conventional canonical simu- 
lations, the raw data feature a considerably larger num- 
ber of events at low energies. This is illustrated in fig- 
ure 5, where we plot the 300 K and 400 K peaked dis- 
tribution functions of figure 4 together with their raw 
multi-overlap peaked distributions 

In figure 6 we give an example of the probability den- 
sity of the distance. For the 400 K simulation with refer- 
ence configuration 1 we plot the probability density of d\ 
as obtained from the multi-overlap simulation together 
with its canonically rc-wcighted probability density. The 
simulation itself is run with the multi-overlap weights 
from the 700 K simulations and the multi-overlap his- 
togram shown is re- weighted to the multi-overlap 400 K 
weights. As expected, we have a flat distribution between 



and n/2 = 9.5. Moreover, there is a good coverage of 
configurations close to the GEM, which are highly sup- 
pressed in the 400 K canonical ensemble. The maximum 
ratio of the multi-overlap density divided by the canoni- 
cal density is 6 x 10 16 in this plot. 

For the same simulation figure 7 depicts separately the 
peaked distribution function of the forward and back- 
ward RWCs (7). A considerable asymmetry is noticeable 
and it turns out that the weights of the 1 /k ensemble [25] 
lead to more RWCs than the flat distribution of figure 6. 
In connection with our simulations this is a lucky cir- 
cumstance, because the 1/k distribution of weights is in 
essence the distribution at a somewhat higher temper- 
ature than that of the simulation. This increases the 
flexibility when estimating good weights at a lower tem- 
perature from the already existing simulation results at 
a higher temperature. 

For multi-overlap simulations the re- weighting towards 
low temperatures can work much better than for canon- 
ical simulations. This is due to the fact that the low- 
energy configurations close to low-energy reference con- 
figuration are already in the ensemble. This is illus- 
trated in figure 8, where we re- weight the data from a 
multi-overlap simulation with reference configuration 1 at 
T = 300 K and compare with a conventional multicanon- 
ical simulation based on the SMMP package [16]. The 
specific heat Cy and the derivative of the overlap with 
respect to the temperature are shown. From 200 K to 
400 K the deviations of the results are of the order of the 
statistical errors, which are not shown for clarity of the 
figure. Below 200 K deviations of the re-weighted overlap 
simulation from the correct behavior become visible, first 
in ^yr then in Cy. Such deviations are expected as the 
low-energy attractor does not lead to a uniform cover- 
age of all low-energy states. The successful re-weighting 
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FIG. 5. Peaked multi-overlap (left-shifted) and canonical 
energy distributions at T = 300 K and T = 400 K. 
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FIG. 6. Probability density of the distance from a 
multi-overlap simulation at T = 400 K (flat) and its canoni- 
cally re- weighted probability density (peaked). 



from high simulation temperatures to lower temperatures 
is an improvement, because the Metropolis dynamics at 
high temperatures is faster. But the re-weighting of a 
multi-overlap simulation to a lower temperature will fail 
at some point, because the reference configuration in- 
troduces a bias towards particular low-energy configura- 
tions. 

The temperature at which Cy and —-3^ take peak val- 
ues correspond to the coil-globule transition temperature 
Tg and the folding temperature Tf [24] . From figure 8 we 
read off the following approximate values: 



T g = 280 K and T f = 245 K 



(46) 



C. Simulations with two reference configurations 

At 300 K we combine the weights from the runs with 
reference configurations 1 and 2 to one weight function 
according to our equation (34). We record now three 
different RWCs: 

1. With respect to reference configuration 1 from d m ; n 
to ti max and back, found 315 times. 

2. With respect to reference configuration 2 from d m i„ 
to d max and back, found 545 times. 

3. From d min of reference configuration 1 to ti m ; n 
of reference configuration 2 and back, found 196 
times. 

In figure 9 we show the probability densities of this sim- 
ulation with respect to the distances from our reference 
configurations. They are no longer flat, but a satisfactory 
coverage in the variables d\ and di is still achieved. Note 
that both probability densities have peaks at d = 6.62, 



which is the distance between configurations 1 and 2. 
This implies that both reference configurations have been 
visited with high probability. 



D. Physics results 

We would like to analyze the transitions between our 
two reference configurations in some detail. For this pur- 
pose we use the rms distance, which is defined by 



ii rms = min 



\ 



1 N 



N 



(47) 



where N is the number of atoms, {x-} are the coordi- 
nates of the reference configuration j, and the minimiza- 
tion is over the translations and rotations of the coordi- 
nates of the configuration {xi}. 

The distance (2) and the rms distance (47) are quite 
distinct. The reason is that a change of a single dihedral 
angle in the central parts of the molecule can cause a large 
deviation in the rms distance. Although the two config- 
urations are then close-by from the point of view of the 
MC algorithm, physically they are rather far apart, as the 
similarity of the three-dimensional structures is governed 
by the rms distance. Therefore, the rms distance distri- 
bution deviates considerably from the dihedral distance 
distribution. We illustrate this by plotting in figure 10 
the rms probability density of the 400 K simulation for 
which the dihedral distance probability density is shown 
in figure 6. 

We now analyze the free-energy landscape [29] from the 
results of our simulation with combined weights at 300 K 
in some detail. We study the landscape with respect to 
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FIG. 7. Peaked distribution functions for the forward 
(d — > d m ax) and backward (d — ► dmin) parts of the random 
walk cycles from a multi-overlap simulation at T = 400 K. 



some reaction coordinates (and hence it should be called 
the potential of mean force). In order to study the tran- 
sition states between reference configurations 1 and 2, 
we first plotted the free-energy landscape with respect 
to the distances d\ and c?2 ■ However, we did not observe 
any transition saddle point. A satisfactory analysis of the 
saddle point becomes possible when the rms distance (in- 
stead of the dihedral distance) is used. Figure 11 shows 
contour lines of the free energy re-weighted to T = 250 
K, which is close to the folding temperature (46). Here, 
the free energy F(rmsl,rms2) is defined by 

F(rmsl 1 rms2) = — fc^Tln P(rmsl, rms2) , (48) 

where rmsl and rms2 are the rms distances defined in 
(47) from the reference configuration 1 and the reference 
configuration 2, respectively, and P{rmsl, rms2) is the 
(reweighted) probability at T = 250 K to find the peptide 
with values rmsl,rms2. The probability was calculated 
from the two-dimensional histogram of bin size 0.06 A x 
0.06 A. The contour lines were plotted every 2ksT (= 
0.99 kcal/mol for T = 250 K). 

Note that the reference configurations 1 and 2, which 
are respectively located at (rmsl,rms2) = (0,4.95) and 
(4.95,0), are not local minima in free energy at the fi- 
nite temperature (T = 250 K) because of the entropy 
contributions. The corresponding local-minimum states 
at Ai and Bi still have the characteristics of the refer- 
ence configurations in that they have backbone hydrogen 
bonds between Gly-2 and Met-5 and between Tyr-1 and 
Phe-4, respectively. We remark that we observe in fig- 
ure 11 another well-defined local minimum state around 
(rmsl,rms2) = (4.7,3.5). This state can also be consid- 
ered to correspond to configuration 2 because we again 
observe the backbone hydrogen bond between Tyr-1 and 
Phe-4. The side-chain structures are, however, more de- 
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FIG. 8. Left-hand-side ordinate: Specific heat re-weighted 
from a multicanonical (MUCA) and from a 300 K multi- 
-overlap (MUOV) simulation with reference configuration 1. 
Right-hand-side ordinate: ^ re-weighted from the same sim- 
ulations, where qi is the overlap with reference configura- 
tion 1. 



viated from configuration 2 than Bi, resulting in a larger 
value of rms2. 

The transition state C in figure 11 should have inter- 
mediate structure between configurations 1 and 2. In 
figure 12 we show a typical backbone structure of this 
transition state. We see the backbone hydrogen bond 
between Gly-2 and Phe-4. This is precisely the expected 
intermediate structure between configurations 1 and 2, 
because going from configuration 1 to configuration 2 we 
can follow the backbone hydrogen-bond rearrangements: 
The hydrogen bond between Gly-2 and Met-5 of config- 
uration 1 is broken, Gly-2 forms a hydrogen bond with 
Phe-4 (the transition state), this new hydrogen bond is 
broken, and finally Phc-4 forms a hydrogen bond with 
Tyr-1 (configuration 2). 

It is interesting to see in figure 11 that there is only one 
saddle point in the free-energy landscape that connects 
configurations 1 and 2. Hence, the transition between 
configurations 1 and 2 always passes through the state 
C. 

In Ref. [22] the low-energy conformations of Met- 
enkcphalin were studied in detail and they were classi- 
fied into several groups of similar structures based on the 
pattern of backbone hydorgen bonds. It was found there 
that below T = 300 K there are two dominant groups, 
which correspond to configurations 1 and 2 in the present 
article. Although much less conspicuous, the third most 
populated structure is indeed the group that is identified 
to be the transition state in the present work. 

In figures 13 and 14 we show the internal energy land- 
scape and the entropy landscape at T = 250 K, respec- 
tively. Here, the internal energy U is defined by the 
(reweighted) average ECEPP/2 potential energy: 
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FIG. 9. Combined weight simulation at T — 300 K: Proba- 
bility densities with respect to the distances di and ofe. 



U(rmsl, rms2) =< E(rmsl,rms2) > 



(49) 



Here, the average was again calculated from the two- 
dimensional histogram of bin size 0.06 Ax 0.06 A. The 
entropy S was then calculated by 

S(rmsl, rms2) = ^ [U(rmsl, rms2) — F(rmsl, rms2)\ . 

(50) 

The landscape in figure 14 is actually —TS(rmsl, rms2). 

Both internal energy and entropy landscapes are more 
rugged than free energy landscape (we observe much 
more number of contour lines in figures 13 and 14 than 
in figure 11). The internal energy has clear local min- 
ima at the points (rmsl,rms2) = (0,4.95) and (4.95,0), 
which respectively correspond to configurations 1 and 2, 
while the entropy landscape has local maxima at these 
points. These two terms tend to cancel each other, and 
the free energy landscape is smoothed out. 

In table IV we list the numerical values of the free 
energy, internal energy, and entropy multiplied by tem- 
perature at the two local-minimum states (Ai and Bi in 
figure 11) and the transition state (C in figure 11). The 
internal energy is just the average of the ECEPP/2 po- 
tential energy (without any shift of zero point). The free 
energy was normalized so that the value at Ai is zero. 
The values at the coordinates of reference configurations 
1 and 2, which are respectively referred to as Ao and Bo 
in the table, are also listed. 

Among the five points, Ao and Bo are unfavored in 
free energy mainly due to the large entropy effects, al- 
though they are energetically most favored. This means 
that at this temperature the exact conformations of the 
reference configurations 1 and 2 are not populated much. 
The relevant states are rather Ai, Bi, and C. The state 
Ai can be considered to be "deformed" configuration 1, 



FIG. 10. Probability density of the rms distance from the 
multi-overlap simulation at T = 400 K of figure 6, and its 
canonically re-weighted probability density. The abscissa is 
the rms distance (A) in Eq. (47) from the reference configu- 
ration 1. 



TABLE IV. Free energy, internal energy, entropy multi- 
plied by temperature at T = 250 K (all in kcal/mol) at the two 
local- minimum states (Ai and Bi) and the transition state 
(C) in figure 11. The values at the coordinates of reference 
configurations 1 and 2, which are respectively referred to as 
Ao and Bo, are also listed. The rms distances are in A. 



Coordinate (rmsl,rms2) 


F 


U 


-TS 


Ai (1.23, 4.83) 





-5.4 


5.4 


Bi (4.17, 2.43) 


1.0 


-3.5 


4.5 


C (3.09, 4.05) 


2.2 


-0.8 


3.0 


Ao (0.03, 4.95) 


15 


-10.5 


26 


Bo (4.95, 0.03) 


20 


-8.1 


28 



and Bi deformed configuration 2 due to the entropy ef- 
fects, whereas C is the transition state between Ai and 
Bi. Among these three points, the free energy F and 
the internal energy U are the lowest at Ai, while the en- 
tropy contribution —TS is the lowest at C. The free en- 
ergy difference AF, internal energy difference AU, and 
entropy contribution difference — TAS are 1.0 kcal/mol, 
1.9 kcal/mol, and —0.9 kcal/mol between Bi and Ai, 
2.2 kcal/mol, 4.6 kcal/mol, and —2.4 kcal/mol between 
C and Ai, and 1.2 kcal/mol, 2.7 kcal/mol, and —1.5 
kcal/mol between C and Bi. Hence, the internal energy 
contribution and the entropy contribution to free energy 
are opposite in sign and the magnitude of the former is 
roughly twice as that of the latter at this temperature. 
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FIG. 11. Free-energy landscape at T = 250 K with respect 
to rms distances (A) from the two reference configurations, 
F(rmsl,rms2). Contour lines are drawn every 2fcsT. The 
labels Ai and Bi indicate the positions for the local-minimum 
states at T = 250 K that originate from the reference config- 
uration 1 and the reference configuration 2, respectively. The 
label C stands for the saddle point that corresponds to the 
transition state. 



IV. SUMMARY AND CONCLUSIONS 

We have outlined an approach to perform MC simu- 
lations which yield the free-energy distribution between 
two reference configurations. The multi-overlap weights 
for this purpose were obtained by a novel, iterative pro- 
cess. The main point of this iterative process is not that 
it is supposed to be more efficient than the recursion that 
was used in the multi-self-overlap simulations of Ref. [12], 
but that it is an entirely independent approach, which 
starts from an analytically controlled limit. Recursions 
like the one used in [12] are not "foolproof". For in- 
stance, while most of the spin glass replica in Ref. [12] 
were well-behaved, a few did not complete their recur- 
sion after more than an entire year of single processor 
CPU time. Similar situations could be encountered in 
all-atom simulations of larger peptides, where the normal 
multicanonical weight recursion as well as similar multi- 
overlap weight recursion could fail. The present method 
provides then an alternative, approaching the physical 
region from a different limit. 

Noticeable, our multi-overlap approach is well-suited 
to be combined with a recently introduced, biased 
Metropolis sampling [30]. Namely, the required config- 



FIG. 12. The transition state between reference configura- 
tions 1 and 2. See the caption of figure 1 for details. 



urations at higher temperatures are as well necessary for 
our particular multi-overlap recursion, so that no extra 
simulations are required in this respect. 

On the physical side, we have found that entropy ef- 
fects are rather important for a small peptide. The ef- 
fects of entropy on the folding of real proteins in realistic 
solvent have yet to be studied in detail. 

We have also performed the analysis of this paper for 
Met-enkephalin with variable u> angles and, in particular, 
simulated with combined weights at a number of temper- 
atures. The results found are quite similar to those re- 
ported in this paper. In future work we intend to analyze 
the transition between reference configuration for larger 
systems of actual interest like /3-lactoglobulin. 
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